Let's fix the tests enough to run with AddressSanitizer and UB Sanitizer and enable those in CI #1151
Merged: tony-josi-aws merged 38 commits into FreeRTOS:main from anordal:build-unittests-with-sanitizers on Jun 14, 2024
@anordal Thanks for taking time to contribute to FreeRTOS+TCP.
anordal force-pushed the build-unittests-with-sanitizers branch 4 times, most recently from f5064bf to 06479ce (June 4, 2024 09:32)
@anordal can you help fix the merge conflict with the …
These build options affect both the tests and the code under test when built from the unit-test CMake file.

Example:

    cmake -DSANITIZE=address,undefined

To reset all options:

    cmake --fresh

Meson users will find this familiar:

    meson -Db_sanitize=…

(When in doubt in CMake, implement what Meson provides out of the box.)

Motivation: ASan and UBSan currently find a lot of crashy problems in the unit tests, and make them visible in plain sight.
Let's not override optimization options: This is surprising when the cmake user tries to set CMAKE_BUILD_TYPE=(Debug|Release).

Disabling the warning with -Wno-div-by-zero seems obsolete: Replacing it with -Werror did not fail, at least with Gcc 13.
I don't know why I get to resolve these, but in all cases, it is FreeRTOS_Sockets.c that is dragging in a dependency on xTCPWindowLoggingLevel, causing a few tests to fail to link:

    FreeRTOS_Sockets.c:5118:(.text+0x18fa2): undefined reference to `xTCPWindowLoggingLevel'

Since it's one external variable, let's add it to the necessary unit tests.

Also under the headline of extern variables: The IPv6 address, which was not there for linkage, could be made const.
…d recvfrom

Symptom: test_vDHCPProcess_eWaitingOffer_CorrectState_ValidBytesInMessage_MatchingEndPoint() segfaults.

What AddressSanitizer says about that:

    test/unit-test/build/Annexed_TCP_Sources/FreeRTOS_DHCP.c:1139:28: runtime error: member access within null pointer of type 'const struct DHCPMessage_IPv4_t'
    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==14403==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000000ec
    ==14403==The signal is caused by a READ memory access.
    ==14403==Hint: address points to the zero page.
        #0 0x456eb7 in prvIsValidDHCPResponse test/unit-test/build/Annexed_TCP_Sources/FreeRTOS_DHCP.c:1139
        #1 0x4584c3 in prvProcessDHCPReplies test/unit-test/build/Annexed_TCP_Sources/FreeRTOS_DHCP.c:1280
        #2 0x45038c in xHandleWaitingOffer test/unit-test/build/Annexed_TCP_Sources/FreeRTOS_DHCP.c:334
        #3 0x45366a in vDHCPProcessEndPoint test/unit-test/build/Annexed_TCP_Sources/FreeRTOS_DHCP.c:735
        #4 0x44fe57 in vDHCPProcess test/unit-test/build/Annexed_TCP_Sources/FreeRTOS_DHCP.c:263
        #5 0x418d2c in test_vDHCPProcess_eWaitingOffer_CorrectState_ValidBytesInMessage_MatchingEndPoint test/unit-test/FreeRTOS_DHCP/FreeRTOS_DHCP_utest.c:147

Diagnosis: pxDHCPMessage in prvProcessDHCPReplies() is the unlucky null pointer. As commented, it is expected to be set as an out-arg of FreeRTOS_recvfrom() due to calling it with FREERTOS_ZERO_COPY, but the condition for it in the mocked FreeRTOS_recvfrom() is that the sum of all flags is FREERTOS_ZERO_COPY + FREERTOS_MSG_PEEK.

Finding the right fix: Should we add a null check? Nope. Set the FREERTOS_MSG_PEEK flag? Nope. The mocked function did not check the FREERTOS_ZERO_COPY flag properly. Observe that in the real FreeRTOS_recvfrom(), specifically inside prvRecvFrom_CopyPacket(), the condition for setting the zero-copy pointer into the buffer with the data depends only on one flag - FREERTOS_ZERO_COPY - and ignores the rest.
It is obviously important that the mocked condition is exactly the same.
The pointer was used before it was initialized. If it happened to be NULL, the test would segfault.
The tested functions intentionally expect there to be bytes before the ethernet buffer:

* test_FreeRTOS_GetUDPPayloadBuffer_*(): The code under test, FreeRTOS_GetUDPPayloadBuffer_Multi, writes 6 bytes before the ethernet buffer. This looks intentional, as the write is commented as doing that.
* FreeRTOS_IP_utest: The code under test, prvProcessIPPacket(), intentionally writes a byte at offset -ipIP_TYPE_OFFSET into its ethernet buffer. I am thankful for the generous comment about the ipIP_TYPE_OFFSET.
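A test can satisfy such code by handing over a pointer into the middle of a larger array, so that the bytes before the "buffer" are real, addressable storage. The following is a sketch under assumptions: the sizes and every `sketch_`/`SKETCH_` name are hypothetical, and `sketch_write_before()` only stands in for code under test that writes at a negative offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes for illustration only. */
#define SKETCH_HEADROOM     ( 6U )
#define SKETCH_FRAME_LEN    ( 64U )

/* Backing storage: headroom bytes followed by the frame itself. */
typedef struct SketchBacking
{
    uint8_t ucStorage[ SKETCH_HEADROOM + SKETCH_FRAME_LEN ];
} SketchBacking_t;

/* Zero the storage and return the pointer a test would pass as the ethernet buffer. */
uint8_t * sketch_frame_start( SketchBacking_t * pxBacking )
{
    assert( pxBacking != NULL );
    memset( pxBacking->ucStorage, 0, sizeof( pxBacking->ucStorage ) );
    return &( pxBacking->ucStorage[ SKETCH_HEADROOM ] );
}

/* Stand-in for code under test that writes one byte just before the buffer.
 * This is valid because the pointer points into the middle of ucStorage. */
void sketch_write_before( uint8_t * pucFrame, uint8_t ucValue )
{
    pucFrame[ -1 ] = ucValue;
}
```

Because the headroom is part of the same array object, AddressSanitizer has no reason to flag the write.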
…e to struct interposing

The test was crashing due to what AddressSanitizer calls a buffer overflow, or really, interposing a TCPSegment_t on top of a TCPWindow_t::xRxSegments member and accessing an interposed struct member that fell outside the underlying TCPWindow_t struct.

The naive fix - not doing that - works:

     void test_vTCPWindowDestroy_list_length_not_zero( void )
     {
         TCPWindow_t xWindow = { 0 };
    -    List_t * pxSegments = &( xWindow.xRxSegments );
    +    TCPSegment_t xSegment = { 0 };
         listLIST_IS_INITIALISED_ExpectAnyArgsAndReturn( pdFALSE );
         listLIST_IS_INITIALISED_ExpectAnyArgsAndReturn( pdTRUE );
         listCURRENT_LIST_LENGTH_ExpectAnyArgsAndReturn( 1 );
    -    listGET_OWNER_OF_HEAD_ENTRY_ExpectAnyArgsAndReturn( pxSegments );
    +    listGET_OWNER_OF_HEAD_ENTRY_ExpectAnyArgsAndReturn( &xSegment );
         /* ->vTCPWindowFree */
    -    uxListRemove_ExpectAnyArgsAndReturn( pdTRUE );
    -    uxListRemove_ExpectAnyArgsAndReturn( pdTRUE );
         listCURRENT_LIST_LENGTH_ExpectAnyArgsAndReturn( 0 );
         vTCPWindowDestroy( &xWindow );
     }

However, this became a different test, as evidenced by the less than 100% line coverage, that two function call expectations had to go, and that it functionally became an exact copy of the next test.

To reach the holes in the test coverage opened by the naive fix, the two list items' container pointers also needed and sufficed to be set.
This test was using the stack of a previously returned function (probably a previous test).

Highlights from AddressSanitizer output:

    ==15832==ERROR: AddressSanitizer: stack-use-after-return
    READ of size 8 at 0x7fdefb013670 thread T0
        #0 0x4325bf in eARPGetCacheEntryByMac source/FreeRTOS_ARP.c:930
        #1 0x421a71 in test_eARPGetCacheEntryByMac_OneMatchingEntry (test/unit-test/build/bin/tests/FreeRTOS_ARP_utest+0x421a71)
    Address 0x7fdefb013670 is located in stack of thread T0 at offset 624 in frame
        #0 0x41f941 in test_vARPRefreshCacheEntry_IPAndMACInDifferentLocations1 (test/unit-test/build/bin/tests/FreeRTOS_ARP_utest+0x41f941)
      This frame has 2 object(s):
        [48, 54) 'xMACAddress' (line 1937)
        [80, 640) 'xEndPoint' (line 1941) <== Memory access at offset 624 is inside this variable

Nulling the dangling pointer is enough to fix the test, but in order to keep the 100% line coverage, it must point at somewhere valid. Therefore doing that.
This expression is obviously undefined when ucRepCount is 0 (left shift by ~0):

    3000U << ( ucRepCount - 1U )

Which is fine if that is impossible. But is it? This case is handled later by clamping the result from 0 to 1 (which hints at how this accidentally works), and this is being tested for (in FreeRTOS_TCP_IP_utest.c::test_prvTCPNextTimeout_ConnSyn_State_Active_Rep0).

I'm also surprised that neither Gcc nor Clang optimizes the UB away (which would make the code behave differently with optimization):

    1500U << ucRepCount

It is very tempting to apply this fix, but 1ms is very different from 1500ms. That may well speak more for lowering the scale factor than making exceptions, though. But not now: For the purpose of fixing sanitizer failures, let's preserve the behaviour for now.
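One way to keep the described behaviour without evaluating the undefined shift is to special-case a repeat count of 0. This is a sketch, not the change made in the PR: the function name and the upper bound are hypothetical, and the value returned for 0 assumes the later clamp from 0 to 1 mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound: 3000U << 20 still fits in 32 bits, so repeat
 * counts up to 21 cannot wrap around. */
#define SKETCH_MAX_REP_COUNT    ( 21U )

/* Timeout in milliseconds for a given repeat count, without undefined shifts. */
uint32_t sketch_syn_timeout_ms( uint8_t ucRepCount )
{
    uint32_t ulTimeoutMs;

    assert( ucRepCount <= SKETCH_MAX_REP_COUNT );

    if( ucRepCount == 0U )
    {
        /* ( 0 - 1U ) would be a shift by UINT_MAX, which is undefined.
         * Assumption: the outcome is later clamped from 0 to 1, so return
         * the clamped value directly. */
        ulTimeoutMs = 1U;
    }
    else
    {
        ulTimeoutMs = 3000U << ( ucRepCount - 1U );
    }

    return ulTimeoutMs;
}
```

For repeat counts of 1 and above this computes the same value as `1500U << ucRepCount`; the two forms only differ at 0, which is exactly the 1 ms versus 1500 ms question raised above.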
The struct used as ethernet buffer did not contain the supposed data. The supposed data, however, seems to be correct based on this resource:

https://support.huawei.com/enterprise/en/doc/EDOC1100174721/8ebcb3c3/icmpv6-router-advertisement-ra-message

AddressSanitizer called it a buffer overflow just because the buffer happened to be shorter than the supposed data.

To make this evident and let type safety prevent this from compiling the wrong way, let's define a struct that contains the right data, and take pointers from the addresses of members instead of casting and doing manual offset calculations as far as possible.

Also remove unused variables.

I also wonder if the first test is not a subset of the second. It causes a subset of things to happen in the code under test, and their names only differ by a typo.
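The idea of taking pointers from member addresses rather than from casts and offsets can be sketched as follows. The layout here is hypothetical and deliberately simplified (byte arrays only, so no padding is expected); it is not the router advertisement struct used in the tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed header. */
typedef struct SketchHeader
{
    uint8_t ucType;
    uint8_t ucCode;
    uint8_t ucChecksum[ 2 ];
} SketchHeader_t;

/* Hypothetical option that follows the header. */
typedef struct SketchOption
{
    uint8_t ucType;
    uint8_t ucLength;
    uint8_t ucData[ 6 ];
} SketchOption_t;

/* The whole test packet as one object: the buffer cannot be shorter than its contents. */
typedef struct SketchPacket
{
    SketchHeader_t xHeader;
    SketchOption_t xOption;
} SketchPacket_t;

/* Member address instead of ( uint8_t * ) pxPacket + sizeof( SketchHeader_t ). */
SketchOption_t * sketch_option( SketchPacket_t * pxPacket )
{
    assert( pxPacket != NULL );
    return &( pxPacket->xOption );
}

/* Offset as the compiler sees it, for comparison with a manual calculation. */
size_t sketch_option_offset( void )
{
    return offsetof( SketchPacket_t, xOption );
}
```

If the option were accidentally left out of `SketchPacket_t`, `sketch_option()` would fail to compile, whereas a manual offset would silently point past the end of the buffer.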
… use before initialization
Symptom:

    AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7ffc9dfa5c07
    READ of size 1 at 0x7ffc9dfa5c07 thread T0
        #0 0x459a49 in prvProcessDHCPReplies test/unit-test/build/Annexed_TCP_Sources/FreeRTOS_DHCP.c:1310
        #1 0x4526d2 in vHandleWaitingAcknowledge test/unit-test/build/Annexed_TCP_Sources/FreeRTOS_DHCP.c:495
        #2 0x4544ef in vDHCPProcessEndPoint test/unit-test/build/Annexed_TCP_Sources/FreeRTOS_DHCP.c:739
        #3 0x43dbd9 in test_vDHCPProcess_eWaitingAcknowledge_DNSIncorrectLength2 (test/unit-test/build/bin/tests/FreeRTOS_DHCP_utest+0x43dbd9)

    Address 0x7ffc9dfa5c07 is located in stack of thread T0
    SUMMARY: AddressSanitizer: dynamic-stack-buffer-overflow test/unit-test/build/Annexed_TCP_Sources/FreeRTOS_DHCP.c:1310 in prvProcessDHCPReplies
    Shadow bytes around the buggy address:
      0x7ffc9dfa5980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x7ffc9dfa5a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x7ffc9dfa5a80: 00 00 00 00 00 00 00 00 ca ca ca ca 00 00 00 00
      0x7ffc9dfa5b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x7ffc9dfa5b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    =>0x7ffc9dfa5c00:[05]cb cb cb cb cb cb cb 00 00 00 00 00 00 00 00
      0x7ffc9dfa5c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x7ffc9dfa5d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x7ffc9dfa5d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x7ffc9dfa5e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x7ffc9dfa5e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Left alloca redzone:   ca
      Right alloca redzone:  cb

There were two problems that conspired to create this segfault.
The first was allowing the option parser to run off the end of the buffer at all:

    /* ulGenericLength is incremented by 100 to have uxDNSCount > ipconfigENDPOINT_DNS_ADDRESS_COUNT scenario */
    ulGenericLength = sizeof( DHCPMsg ) + 100;

The second problem was letting it overshoot the stop byte (0xFF). Which is a problem with having manually updated indexes and length fields. The stop byte was at the end of the buffer, but was of no help, because the buffer length was off by -3 (missing 2 bytes for the opcode and length field of the 6 server addresses and 1 byte to account for an unexplained hole in the serialized stream).

The real fix for this kind of fragility is using some helper functions for serializing the data while keeping indexes and lengths consistent (not to mention collapsing repeated lines 8-fold). Anyway, it is trivial to add a check that the serialized stream ends at the end of the buffer (done). Whether to add a functioning stop byte does not matter and should not be needed anymore with such a check.

I initially fixed it the wrong way, by keeping it within the same buffer, which hurt line coverage. But what the test wants to test is as commented: At least 6 server addresses, because that's the value of ipconfigENDPOINT_DNS_ADDRESS_COUNT + 1. No "invalid" length required, just an overabundance of DNS servers. As such, let's rename the test.

Btw, the test was using a VLA (fixed), and most of the uint32_t writes are still unaligned (I replaced one of them with memcpy).
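The kind of serialization helper suggested above could look roughly like this. It is a sketch only: the writer struct and both functions are hypothetical and not part of the test suite; the points taken from the text are that the index and length byte are derived in one place, and that 0xFF is the stop byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical write cursor over a test buffer. */
typedef struct SketchWriter
{
    uint8_t * pucBuffer;
    size_t uxCapacity;
    size_t uxUsed;
} SketchWriter_t;

/* Append one code/length/value option. The length byte and the index are
 * updated here, so they cannot drift apart from the payload size. */
void sketch_put_option( SketchWriter_t * pxWriter,
                        uint8_t ucCode,
                        const void * pvData,
                        uint8_t ucLength )
{
    /* Refuse to run off the end of the buffer. */
    assert( ( pxWriter->uxCapacity - pxWriter->uxUsed ) >= ( ( size_t ) ucLength + 2U ) );

    pxWriter->pucBuffer[ pxWriter->uxUsed++ ] = ucCode;
    pxWriter->pucBuffer[ pxWriter->uxUsed++ ] = ucLength;

    if( ucLength > 0U )
    {
        /* memcpy also sidesteps unaligned 32-bit stores. */
        memcpy( &( pxWriter->pucBuffer[ pxWriter->uxUsed ] ), pvData, ucLength );
        pxWriter->uxUsed += ucLength;
    }
}

/* Append the stop byte. */
void sketch_put_end( SketchWriter_t * pxWriter )
{
    assert( pxWriter->uxUsed < pxWriter->uxCapacity );
    pxWriter->pucBuffer[ pxWriter->uxUsed++ ] = 0xFFU;
}
```

A test built this way can finish with a single check that `uxUsed` equals `uxCapacity`, which corresponds to the "stream ends at the end of the buffer" check described above.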
…name_SuccessAddressInCache
UBSan happens to catch this as a misaligned pointer deref within the null page one moment before segfault:

    source/FreeRTOS_DNS_Parser.c:761:49: runtime error: member access within misaligned address 0x00000000012c for type 'struct freertos_addrinfo', which requires 8 byte alignment

This was traced back to the test test_parseDNSAnswer_dns_nocallback_false(), which creates an uninitialized pointer and passes it on.

Other tests were also found doing the same, though these did not lead to a segfault on GCC 12 and 13, only on GCC 11 in CI with AddressSanitizer.

This is a class of error that a higher warning level could easily forbid (reading an uninitialized variable, pointer or not): -Werror=maybe-uninitialized
xProcessCheckOption() reads the length from the second byte. So for the purpose of testing reading the second byte, but no more, the buffer length was correctly given as 2, except that the buffer itself must then actually be at least 2 bytes long.
FreeRTOS_DNS_utest.c::test_FreeRTOS_gethostbyname_FailLongAddress: Array length vs index of last element: I was about to add one more byte to the buffer, but it looked like that had been attempted before without remembering to initialize them. Therefore, remove those bytes instead.

FreeRTOS_Routing_utest.c::test_pcEndpointName_IPv{4,6}_HappyPath: Can be summed up as sizeof() != strlen(). These tests were copying one byte too many from their test input strings.

Non-functional cleanups:

* Let input strings have static storage duration (avoid copy to stack).
* I found it confusing to take the address of the string constants, as it performs the same pointer decay as doing nothing.
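The sizeof() versus strlen() distinction is easy to reproduce in a few lines. This sketch uses a hypothetical string and hypothetical function names; it is not the test input from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Static storage duration, and used without taking its address:
 * the array decays to a pointer by itself. The name is hypothetical. */
static const char pcSketchName[] = "eth0";

/* Number of characters, excluding the terminating NUL. */
size_t sketch_name_length( void )
{
    return strlen( pcSketchName );
}

/* Size of the array, including the terminating NUL. */
size_t sketch_name_size( void )
{
    return sizeof( pcSketchName );
}

/* Copy only the characters into a destination with no room for a terminator.
 * Copying sizeof( pcSketchName ) bytes here would write one byte too many. */
void sketch_copy_chars( char * pcDest, size_t uxDestLength )
{
    assert( uxDestLength >= strlen( pcSketchName ) );
    memcpy( pcDest, pcSketchName, strlen( pcSketchName ) );
}
```

For a string literal of four characters, the array is five bytes long, which is the off-by-one the sanitizer reported.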
Consider undoing this and see if the code under test needs fixing. LeakSanitizer finds these.
The pointer to the allocated memory was reset. ⏚
Symptom (UB Sanitizer):

    Store to misaligned address 0x7ffe* for type 'uint32_t', which requires 4 byte alignment

When repeated, these 4-byte fields are 2 bytes apart (because of the option and length bytes). The padding byte added to each test does not solve this problem (consider removing). Should have used memcpy (done).

Actually, one thing that makes memcpy tedious is that it takes an address, not a value. I got tired of memcpy halfway through; this is what I mean by helper functions (see the commit about deleting a buffer overflow): The ultimate solution is not memcpy, but helpers that remove those manual indexes and length fields, and with that, the possibility for inconsistencies that can lead to such a buffer overflow.
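The tedium of memcpy taking an address can be hidden behind a small by-value helper. This is a minimal sketch with hypothetical names; it stores in host byte order and leaves any network byte order conversion to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at any address, aligned or not. Taking the value as
 * a parameter gives memcpy an address to copy from. */
void sketch_store_u32( uint8_t * pucDest, uint32_t ulValue )
{
    assert( pucDest != NULL );
    memcpy( pucDest, &ulValue, sizeof( ulValue ) );
}

/* Load a 32-bit value from any address, aligned or not. */
uint32_t sketch_load_u32( const uint8_t * pucSource )
{
    uint32_t ulValue;

    assert( pucSource != NULL );
    memcpy( &ulValue, pucSource, sizeof( ulValue ) );
    return ulValue;
}
```

A call such as `sketch_store_u32( &ucBuffer[ 3 ], ulAddress )` then reads almost like the original assignment, without the misaligned store that UBSan reports.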
test_prvProcessICMPMessage_IPv6_NeighborSolicitationCorrectLen() required these fixes when compiled with -fsanitize=address,memory

    +usGenerateProtocolChecksum_IgnoreAndReturn( ipCORRECT_CRC );
    +vReturnEthernetFrame_ExpectAnyArgs();

… but only this fix when compiled regularly:

    +usGenerateProtocolChecksum_IgnoreAndReturn( ipCORRECT_CRC );

Thankfully, the intention is clear from the comment. It fails extra with sanitization because the two compared IP addresses actually do compare equal. Which is fixable. But removing the test did not impact coverage.
…to lack of initialization
As commented, it had to be a separate build because branch coverage (currently) doesn't ignore artificial branches added by sanitizers. On reusing the same build directory: It's totally possible to use separate build directories in build/, but there is no correctness benefit (CMake rebuilds the object files whose recipe has changed anyway). Rather, CMake saves (130) jobs that don't need to run again when reusing the same build directory. On which builds to build and run first (aubsan before coverage): When it matters, which is when a test is crashing, that's generally when you want to see the AddressSanitizer output.
… it's not fun when it only fails in CI. The lookup happened to fail to fail with AddressSanitizer, but only on GCC 11 (not 12 or 13).
With Gcc 11 + AddressSanitizer, the mocked recvfrom would not return a NULL buffer (unlike Gcc 12 and 13 with and without sanitization). A custom stub function gave enough control to do that. The existing FreeRTOS_recvfrom_Generic_NullBuffer() stub did almost the same, but was unused and meaningless (failed to set its out-argument), so it could be replaced.
test_SendPingRequestIPv6_SendToIP_Pass(): This test segfaulted without AddressSanitizer:

    'build/normal/bin/tests/FreeRTOS…' terminated by signal SIGSEGV

test_SendPingRequestIPv6_Assert():

    ==7143==AddressSanitizer CHECK failed: ../../../../src/libsanitizer/asan/asan_descriptions.cpp:80 "((0 && "Address is not in memory and not in shadow?")) != (0)" (0x0, 0x0)
        #0 0x7ff6c812f9a8 in AsanCheckFailed ../../../../src/libsanitizer/asan/asan_rtl.cpp:74
        #1 0x7ff6c815032e in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:78
        #2 0x7ff6c809fa77 in GetShadowKind ../../../../src/libsanitizer/asan/asan_descriptions.cpp:80
        #3 0x7ff6c809fa77 in __asan::GetShadowAddressInformation(unsigned long, __asan::ShadowAddressDescription*) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:96
        #4 0x7ff6c809fa77 in __asan::GetShadowAddressInformation(unsigned long, __asan::ShadowAddressDescription*) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:93
        #5 0x7ff6c80a1296 in __asan::AddressDescription::AddressDescription(unsigned long, unsigned long, bool) ../../../../src/libsanitizer/asan/asan_descriptions.cpp:441
        #6 0x7ff6c80a3a84 in __asan::ErrorGeneric::ErrorGeneric(unsigned int, unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long) ../../../../src/libsanitizer/asan/asan_errors.cpp:389
        #7 0x7ff6c812efc5 in __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) ../../../../src/libsanitizer/asan/asan_report.cpp:476
        #8 0x7ff6c80abc44 in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:799
        #9 0x55f2e38a3620 in FreeRTOS_SendPingRequestIPv6 build/u22/Annexed_TCP_Sources/FreeRTOS_ND.c:768
        #10 0x55f2e3893053 in test_SendPingRequestIPv6_Assert test/unit-test/FreeRTOS_ND/FreeRTOS_ND_utest.c:1065
        #11 0x55f2e389c5dd in run_test build/u22/FreeRTOS_ND_utest_runner.c:201
        #12 0x55f2e389ca84 in main build/u22/FreeRTOS_ND_utest_runner.c:252
        #13 0x7ff6c6bcbd8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
        #14 0x7ff6c6bcbe3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
        #15 0x55f2e38873d4 in _start (build/u22/bin/tests/FreeRTOS_ND_utest+0x233d4)

test_prvProcessICMPMessage_IPv6_NeighborSolicitationNullEP() behaved differently with and without ASan on Gcc 11. Without AddressSanitizer on Gcc 11:

    FreeRTOS_ND_utest.c:1427:test_prvProcessICMPMessage_IPv6_NeighborSolicitationNullEP: FAIL:Function usGenerateProtocolChecksum. Called more times than expected.
Under Gcc 11, this expression in the tested function lTCPWindowTxAdd() was always true, leading to imperfect coverage: pxSegment->lDataLength < pxSegment->lMaxLength With Gcc 13, they were both 0. Let's add zero-initialization to make this what's tested for.
Yes. I see you fixed a use after free (ba6ba81). I'll push a rebase.
anordal force-pushed the build-unittests-with-sanitizers branch from 06479ce to 88a73c4 (June 7, 2024 16:47)
shubnil approved these changes Jun 14, 2024
tony-josi-aws approved these changes Jun 14, 2024
This PR adds an option to build with sanitizers, fixes all AddressSanitizer and UB-Sanitizer errors (because these sanitizers can be run together) and enforces them in CI.
Motivation:
The tests were crashing when compiled with Gcc 13. With these fixes, they no longer do.
I think it would be nice for future development to run the tests with sanitizers in CI to prevent the sort of bugs I've now fixed. They were not few, so fixing them should help reduce the amount of flakiness (especially crashiness) in the tests.
I have focused on what's necessary to enable these sanitizers, but there is more. Particularly uninitialized variables. The next thing I would do is ban them with -Werror=maybe-uninitialized, but that's going to reveal the rest of that iceberg.

Test Matrix

cmake -DSANITIZE=address,undefined and cmake -DSANITIZE=.

Fixing line coverage with sanitizers is still more work, but branch coverage is currently not attainable (presumably due to artificial branches that get inserted but not ignored).
Checklist:
Related Issue
None.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.